Dev by zTgx · Pull Request #105 · vectorlessflow/vectorless

zTgx · 2026-04-22T07:26:37Z

Summary

Changes

Checklist

Code compiles (cargo build)
Tests pass (cargo test --lib --all-features)
No new clippy warnings (cargo clippy --all-features)
Public APIs have documentation comments
Python bindings updated (if Rust API changed)

Notes

- Add detailed README.md documenting the full high-level Vectorless Python API
- Implement main.py demonstrating Session and SyncSession classes usage
- Cover 10 key topics including session creation, indexing, querying, and metrics
- Include sample documents for architecture, finance, and security domains
- Provide examples for both async and sync APIs with proper error handling
- Demonstrate event callbacks, document management, and streaming queries

…ve tests

Implement proper UTF-8 character boundary handling when truncating long feedback strings to prevent panics with multi-byte characters like emojis and Chinese characters. Replace unsafe byte-based slicing with ceil_char_boundary method and add extensive test coverage for various UTF-8 scenarios including ASCII, multi-byte characters, emojis, and short strings.

fix(engine): improve document loading error handling during graph rebuild

Enhance error reporting and handling when loading documents for graph rebuilding by tracking individual document failures, adding detailed warnings for missing or inaccessible documents, and providing more granular statistics about successful vs failed loads.

refactor(understand): improve JSON parsing robustness and error messages

Update JSON extraction from LLM responses to properly handle code fences with language tags and missing closing fences. Add detailed warning logs for parsing failures and fix edge cases where JSON keys start with fence identifier letters ('j', 's', 'o', 'n').

fix(dedup): handle None document names correctly in evidence deduplication

Ensure proper deduplication when Evidence objects have None for doc_name by using "_unknown" placeholder, preventing incorrect deduplication between documents with explicit names and those without.

perf(cache): optimize cache performance with VecDeque and poison recovery

Replace Vec with VecDeque for O(1) LRU eviction operations, reducing cache maintenance overhead. Add poison lock recovery mechanism to maintain cache availability when worker threads panic, preventing silent failures and ensuring continued operation with stale data instead of blocking access.
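The cache change described above combines two ideas: evicting from the front of a `VecDeque` in O(1), and recovering a poisoned lock instead of propagating the panic. The following is a minimal sketch of both, not the project's actual cache. `LruCache`, `lock_recovering`, and the string key/value types are hypothetical names chosen for illustration, and for brevity the sketch evicts by insertion order rather than tracking read recency.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical bounded cache: `order` remembers insertion order so the
/// oldest key can be evicted with an O(1) `pop_front`.
struct LruCache {
    capacity: usize,
    order: VecDeque<String>,
    entries: HashMap<String, String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache {
            capacity,
            order: VecDeque::new(),
            entries: HashMap::new(),
        }
    }

    fn insert(&mut self, key: &str, value: &str) {
        // Only record the key in `order` the first time it is seen.
        if self.entries.insert(key.to_string(), value.to_string()).is_none() {
            self.order.push_back(key.to_string());
        }
        // Evict from the front until the cache fits its capacity again.
        while self.entries.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                None => break,
            }
        }
    }

    fn get(&self, key: &str) -> Option<&String> {
        self.entries.get(key)
    }
}

/// Lock a mutex, recovering the inner data if another thread panicked
/// while holding the lock. The data may be stale, but stays readable.
fn lock_recovering<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

With a `Vec`, removing the oldest entry shifts every remaining element, which is where the O(n) cost comes from. `PoisonError::into_inner` hands back the guard, so callers keep working with whatever state the panicking thread left behind.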

Add comprehensive test example that indexes a realistic technical document and asks complex questions requiring deep reasoning across document sections, demonstrating the engine's capability beyond simple keyword matching.

fix: update README tagline by removing synthesis reference

Remove "Exact, not synthesized" from the header tagline in README.md to better align with current project focus and messaging.

refactor: enhance logging across core engine components

Add detailed logging information throughout the engine including:

- Evidence evaluation counts in orchestrator
- Replanning evidence metrics
- Navigation planning and rounds tracking
- Index persistence status updates
- Query understanding initiation
- Dispatcher operation flow

Replace generic log messages with structured logging containing relevant context like document names, round numbers, and operation metrics for better debugging and monitoring.

- Configure tracing subscriber with environment filter support, allowing log level to be controlled via RUST_LOG environment variable
- Add document resolution count logging to track query processing
- Add document loading statistics showing loaded and failed counts

…vigation

Support relative paths with "/" separator in cd command (e.g., "Research Labs/Lab B") alongside existing absolute paths. Update navigation prompts to clarify path support including both relative paths like "Section/Sub" and absolute paths like "/root/Section". Add comprehensive tests for relative path navigation scenarios including success cases and partial failure handling.

refactor(index): extract keywords from full content instead of samples

Always extract keywords from full node content rather than falling back to content samples when summaries are empty. This ensures more comprehensive keyword coverage across documents.

feat(query): enhance query understanding with detailed logging

Include key concepts, strategy hints, and rewritten queries in understanding logs for better debugging and visibility into query processing decisions.

feat(search): add content snippets to search results for relevance

Include content snippets around matching keywords in search results to help users judge relevance. Add new content_snippet utility function that extracts context-aware text fragments centered on keywords with configurable length limits and proper UTF-8 boundary handling. Apply this enhancement to find_cross, worker execution, and planning components.
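As a rough illustration of the snippet extraction described above, the sketch below centers a byte window on the first keyword match and then snaps both ends onto UTF-8 character boundaries. The signature, the case-sensitive match, and the fallback to the start of the content when the keyword is absent are all assumptions made for this example; the real content_snippet function may behave differently.

```rust
/// Hypothetical snippet helper: returns at most `max_len` bytes of `content`,
/// roughly centered on the first occurrence of `keyword`.
fn content_snippet<'a>(content: &'a str, keyword: &str, max_len: usize) -> &'a str {
    // Assumption: case-sensitive search; fall back to the start if not found.
    let pos = content.find(keyword).unwrap_or(0);

    // Split the space not used by the keyword evenly on both sides.
    let half = max_len.saturating_sub(keyword.len()) / 2;
    let mut start = pos.saturating_sub(half);

    // Move the start forward until it lands on a character boundary.
    // `pos` is itself a boundary, so this never skips past the keyword.
    while !content.is_char_boundary(start) {
        start += 1;
    }

    // Move the end backward so the slice never exceeds `max_len` bytes.
    let mut end = (start + max_len).min(content.len());
    while !content.is_char_boundary(end) {
        end -= 1;
    }

    &content[start..end]
}
```

Snapping the start forward and the end backward keeps the result within the byte budget, at the cost of dropping a partial character at either edge. Slicing at an index that is not a character boundary would panic, which is the failure mode the boundary handling is meant to avoid.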

…ality

- Increase max_rounds from 8 to 15 and max_llm_calls from 15 to 25
- Update find command to support multi-word searches and provide better fallback behavior for title matching
- Enhance search strategy documentation with navigation efficiency guidelines
- Update all test cases to reflect new max_rounds value of 15
- Improve find command output to include content snippets when available

Add BFS-based deep search functionality to resolve_target_extended that searches up to 4 levels deep for matching node titles. The new search hierarchy prioritizes: 1) Direct children via NavigationIndex, 2) Direct children via TreeNode titles, and 3) Deep descendant search with breadth-first traversal. Also include comprehensive test coverage for the new deep search functionality.

refactor(agent): improve evidence formatting with content previews

Replace character count displays with actual content excerpts in evidence summaries for both evaluation and replanning phases. Content is truncated to 500 characters to maintain manageable prompt sizes. Update format_evidence_summary and format_evidence_context functions to show meaningful content previews instead of just character counts.

feat(agent): track collected nodes separately from visited nodes

Introduce collected_nodes HashSet to distinguish between nodes that have been visited during navigation versus nodes whose content has been specifically collected via cat operations. Add has_evidence_for method to check collection status and evidence_for_check method to provide content-excerpt based evidence summaries for sufficiency checks.
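The depth-limited breadth-first search mentioned for resolve_target_extended can be sketched roughly as follows. The `TreeNode` struct here is a simplified stand-in, not the project's type, and `find_descendant` along with its case-insensitive title comparison is a hypothetical helper written only to show the traversal order.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a document tree node (not the project's type).
struct TreeNode {
    title: String,
    children: Vec<TreeNode>,
}

impl TreeNode {
    fn new(title: &str, children: Vec<TreeNode>) -> Self {
        TreeNode {
            title: title.to_string(),
            children,
        }
    }
}

/// Breadth-first search for a descendant whose title matches `target`,
/// looking at most `max_depth` levels below `root`.
fn find_descendant<'a>(
    root: &'a TreeNode,
    target: &str,
    max_depth: usize,
) -> Option<&'a TreeNode> {
    let mut queue: VecDeque<(&'a TreeNode, usize)> = VecDeque::new();
    // Direct children sit at depth 1; the root itself is never a match.
    for child in &root.children {
        queue.push_back((child, 1));
    }

    while let Some((node, depth)) = queue.pop_front() {
        // Assumption: titles are compared ignoring ASCII case.
        if node.title.eq_ignore_ascii_case(target) {
            return Some(node);
        }
        // Only descend further while the depth budget allows it.
        if depth < max_depth {
            for child in &node.children {
                queue.push_back((child, depth + 1));
            }
        }
    }
    None
}
```

Because the queue is processed level by level, a shallow match is always returned before a deeper one with the same title, which fits the stated priority of direct children over deep descendants. Calling `find_descendant(&root, "Lab B", 4)` would correspond to the 4-level limit in the commit message.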

- Remove character limit truncation from evidence content in evaluation
- Allow full content to be available for LLM assessment of relevance
- Increase MAX_FEEDBACK_CHARS from 500 to 2000 to prevent prompt bloat while maintaining useful context

fix(logging): add compact formatting to tracing subscriber

- Create new supervisor module to encapsulate the dispatch → evaluate → replan logic
- Replace inline supervisor loop implementation with call to run_supervisor_loop function
- Add SupervisorOutcome struct to return iteration count, evaluation sufficiency status, and LLM call counts
- Maintain same functionality while improving code organization and testability

refactor(worker): extract navigation loop into separate module

- Move navigation loop logic from worker module to new navigation module
- Replace inline navigation loop with run_navigation_loop function call
- Split complex navigation logic into smaller helper functions for building prompts, handling parsing failures, and managing replanning
- Improve code organization and maintainability

feat(tools): remove content truncation in cat tool

- Remove character limit and truncation logic from cat tool output
- Return full content string instead of truncated preview
- This allows complete evidence collection without size limitations
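To make the dispatch → evaluate → replan cycle from the supervisor refactor concrete, here is a minimal sketch of how such a loop could be shaped. It is not the signature of run_supervisor_loop. The function name `supervisor_loop_sketch`, the closure parameters, and the field names on `SupervisorOutcome` are assumptions; only the three reported quantities (iteration count, sufficiency, LLM call count) come from the description above.

```rust
/// Assumed shape of the outcome; field names are illustrative.
#[derive(Debug, PartialEq)]
struct SupervisorOutcome {
    iterations: usize,
    sufficient: bool,
    llm_calls: usize,
}

/// Hypothetical loop: each closure receives the round index and reports
/// how many LLM calls it consumed. `evaluate` also reports sufficiency.
fn supervisor_loop_sketch(
    max_iterations: usize,
    mut dispatch: impl FnMut(usize) -> usize,
    mut evaluate: impl FnMut(usize) -> (bool, usize),
    mut replan: impl FnMut(usize) -> usize,
) -> SupervisorOutcome {
    let mut outcome = SupervisorOutcome {
        iterations: 0,
        sufficient: false,
        llm_calls: 0,
    };

    while outcome.iterations < max_iterations {
        let round = outcome.iterations;
        outcome.iterations += 1;

        // 1) Dispatch work, 2) evaluate the collected evidence.
        outcome.llm_calls += dispatch(round);
        let (sufficient, calls) = evaluate(round);
        outcome.llm_calls += calls;

        if sufficient {
            outcome.sufficient = true;
            break;
        }
        // 3) Replan only if another round will actually run.
        if outcome.iterations < max_iterations {
            outcome.llm_calls += replan(round);
        }
    }
    outcome
}
```

Passing the three phases in as closures is one way to get the testability the refactor aims for, since a test can drive the loop with stubs and assert on the returned outcome without calling an LLM.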

…nippet logic

BREAKING CHANGE: Removed MAX_FEEDBACK_CHARS constant and automatic truncation in set_feedback method. Feedback will now be stored as-is without size limitations.

- Moved content_snippet function to tools module for shared usage
- Updated all references to use the centralized content_snippet function
- Increased snippet length from 150/120 to 300 characters for better context
- Replaced character limit checks with entry count limits in planning
- Added MAX_PLAN_ENTRIES (15), MAX_SECTION_SUMMARIES (10), and MAX_EXPANSION_ENTRIES (8) constants for better control over prompt size
- Removed content preview truncation in grep tool

vercel · 2026-04-22T07:26:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vectorless	Ready	Preview, Comment	Apr 22, 2026 7:26am

zTgx added 10 commits April 22, 2026 08:33

zTgx merged commit fb28c3a into main Apr 22, 2026
6 of 7 checks passed

zTgx merged 10 commits into main from dev
